Corpus-oriented Acquisition of Chinese Grammar

Authors

  • Yan Zhang
  • Hideki Kashioka
Abstract

The acquisition of grammar from a corpus is a challenging task in the preparation of a knowledge bank. In this paper, we discuss the extraction of Chinese grammar oriented to a restricted corpus. First, probabilistic context-free grammars (PCFG) are extracted automatically from the Penn Chinese Treebank and are regarded as the baseline rules. Then a corpus-oriented grammar is developed by adding specific information, including head information, from the restricted corpus. Next, we describe the peculiarities and ambiguities in the extracted grammar, particularly between the phrases “PP” and “VP”. Finally, the parsing results of the utterances are used to evaluate the extracted grammar.
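The abstract does not say how the baseline rules are estimated. As a minimal sketch, assuming the common relative-frequency approach (count each rewrite rule in the bracketed trees, then divide by the total count of its left-hand side), the extraction step might look like the Python below. The function names and the two toy trees are invented for illustration and are not taken from the Penn Chinese Treebank or from the paper; head information is not modeled here.

```python
from collections import Counter

def parse_tree(s):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()

def count_rules(tree, counts):
    """Add one count for every rule LHS -> RHS used in the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)

def estimate_pcfg(bracketed_trees):
    """Relative-frequency estimate: P(LHS -> RHS) = count(rule) / count(LHS)."""
    counts = Counter()
    for s in bracketed_trees:
        count_rules(parse_tree(s), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy trees (invented, not real treebank data).
TOY_TREES = [
    "(IP (NP (PN 我)) (VP (PP (P 在) (NP (NN 北京))) (VP (VV 工作))))",
    "(IP (NP (PN 他)) (VP (VV 来)))",
]

pcfg = estimate_pcfg(TOY_TREES)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.3f}]")
```

In this toy grammar the rule VP -> PP VP competes with VP -> VV for the same probability mass, a small instance of the kind of PP/VP interaction the abstract mentions.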


Similar resources

Knowledge Acquisition And Chinese Parsing Based On Corpus

In Natural Language Processing (NLP), one key problem is how to design a robust and effective parsing system. In this paper, we will introduce a corpus-based Chinese parsing system. Our efforts are concentrated on: (1) knowledge acquisition and representation; and (2) the parsing scheme. The knowledge of this system is principally extracted from analyzed corpus, others are a few grammatical princ...


Incorporating Cognitive Linguistic Insights into Classrooms: the Case of Iranian Learners’ Acquisition of If-Clauses

Cognitive linguistics gives the most inclusive, consistent description of how language is organized, used and learned to date. Cognitive linguistics contains a great number of concepts that are useful to second language learners. If-clauses in English, on the other hand, remain intriguing for foreign language learners to struggle with, due to their intrinsic intricacies. EFL grammar books are ...


The L2 Acquisition of the Chinese Aspect Marking

By analyzing corpus data, we have shown that the tendencies of restricting perfective past marking to Accomplishments and Achievements and imperfective marking to Statives and Activities as described by the Aspect Hypothesis (Shirai, 1991; Andersen & Shirai, 1996), undesirable in the acquisition of various languages, are desirable in the acquisition of a language like Chinese, because these ten...


Treebank-Based Acquisition of Chinese LFG Resources for Parsing and Generation

This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena an...


Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank

Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000 word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chines...



Publication date: 2005